Proteogenomics in cerebrospinal fluid and plasma reveals new biological fingerprint of cerebral small vessel disease

Cerebral small vessel disease (cSVD) is a leading cause of stroke and dementia with no specific mechanism-based treatment. We used Mendelian randomization to combine a unique cerebrospinal fluid (CSF) and plasma pQTL resource with the latest European-ancestry GWAS of MRI-markers of cSVD (white matter hyperintensities, perivascular spaces). We describe a new biological fingerprint of 49 protein-cSVD associations, predominantly in the CSF. We implemented a multipronged follow-up, across fluids, platforms, and ancestries (European and East-Asian), including testing associations of direct plasma protein measurements with MRI-cSVD. We highlight 16 proteins robustly associated in both CSF and plasma, with 24/4 proteins identified in CSF/plasma only. cSVD-proteins were enriched in extracellular matrix and immune response pathways, and in genes enriched in microglia and specific microglial states (integration with single-nucleus RNA sequencing). Immune-related proteins were associated with MRI-cSVD already at age twenty. Half of cSVD-proteins were associated with stroke, dementia, or both, and seven cSVD-proteins are targets for known drugs (used for other indications in directions compatible with beneficial therapeutic effects). This first cSVD proteogenomic signature opens new avenues for biomarker and therapeutic developments.

of the UKRI Medical Research Council Neuroscience and Mental Health Board until March 2024. He acknowledges consultancy fees from Biogen, Sudo, Nimbus and GSK. He has received speakers' honoraria from Sanofi and Redburn, and has received research or educational funds from Biogen, Merck, Bristol Meyers Squibb and Nimbus. J.W. declares no commercial COI; various academic research grants and is CI for LACunar Intervention Trials.

Introduction
Characterized by changes in the structure and function of small brain vessels, cerebral small vessel disease (cSVD) is a leading cause of ischemic and hemorrhagic stroke, cognitive decline and dementia. cSVD is extremely common with increasing age and most often covert, namely detectable on brain imaging in the absence of clinical symptoms. Covert cSVD portends a considerably increased risk of stroke and dementia, and thus represents a major target to prevent these disabling conditions and promote healthier brain aging 1. The most common and heritable MRI-markers of cSVD (MRI-cSVD) are white matter hyperintensities of presumed vascular origin (WMH) and perivascular spaces (PVS) 2.
Hypertension is the strongest known risk factor for cSVD, representing a major target for prevention 1.
However, vascular risk factors explain only a small fraction of MRI-cSVD variability in older age 3, and drugs specifically targeting pathological processes underlying cSVD are lacking. Genomics can provide a strong foundation for mechanistic studies and drug target discovery 4. Recent genetic studies have identified > 70 genetic risk loci associated with cSVD 5,6; however, causal genes and underlying molecular pathways remain poorly understood.
As disease occurrence reflects the complex interplay of factors beyond DNA sequence, there is growing interest in identifying circulating biomarkers, such as proteins, capturing these downstream factors, to enhance our understanding of the underlying biology, accelerate omics-driven drug discovery, and potentially generate circulating biomarkers for clinical use 7. While large-scale proteomic investigations have recently been conducted for stroke and dementia, with promising findings 7-13, studies on proteomics of cSVD have been conducted on limited sets of proteins, in small studies of European ancestry (N < 5,000), and in plasma only 14-18. We hypothesize that, while plasma may enable easy-access biomarker measurements, CSF, the fluid circulating in perivascular spaces, could reveal a more accurate biological fingerprint of cSVD.
Here we used two-sample Mendelian randomization (2SMR), leveraging large proteomic and genomic resources, to investigate the relation of circulating protein levels in CSF and plasma with WMH and PVS burden and to explore its causal relation and directionality. We further used a multipronged approach for the follow-up of identified associations in independent samples, across fluids, proteomics platforms, ancestries and the lifespan, using both 2SMR and individual-level data. We also explored the ability of proteogenomics to predict extensive cSVD and tested the relation of cSVD-associated proteins with risk of stroke and Alzheimer's disease (AD). Using single-cell sequencing resources we deciphered cell-types and pathways involved. Finally, we combined our results with pharmacological databases for proteomics-driven drug discovery.

Results
The study design is summarized in Fig. 1.
In secondary analyses including both cis- and trans-pQTLs, we found 340 proteins associated with at least one MRI-cSVD in CSF or plasma (p FDR < 0.05), of which 176 were driven by two trans-hotspots at APOE (147 proteins) and chr16q24 (29 proteins). Although most protein-cSVD associations revealed novel pathways not previously identified, some relate to previous cSVD GWAS findings. Two cis-pQTL were associated with WMH volume at genome-wide significance, for FBLN3 (encoded by EFEMP1) at chr2p16 and NMT1 (NMT1) at chr17q21. Additionally, HTRA1, of which lower genetically determined plasma levels were associated with extensive HIP-PVS, is encoded by a gene harboring both rare mutations causing monogenic cSVD 21 and common variants associated with small vessel stroke and suggestively WMH 20,22. From secondary analyses, eight trans-pQTL for 29 proteins at the chr16q24 hotspot were associated at genome-wide significance with WMH. The APOE hotspot included four proteins encoded by genes in genome-wide or gene-wide significant risk loci for WMH 20 and extreme-cSVD 23 (APOE, MRPL38, SULT1B1, and MSRA; Supplementary Tables 6-9, Extended Data Fig. 1).
To assess the independence of observed associations, we used LD-score regression (LDSC) 24 to quantify the genetic correlation between protein levels. Only one genetic correlation was significant after multiple testing correction (EPHB4 with PILRA-M14 in plasma at p < 5×10⁻⁵, Methods, Extended Data Fig. 2). Several protein-protein interactions were identified using the STRING database (Fig. 2F).

Follow-up of significant protein-cSVD associations
We used a multi-pronged approach to follow up protein-cSVD associations based on cis-pQTL with significant MR results and colocalization evidence, across fluids, platforms, and ancestries (Figs. 1 and 3).
First, using 2SMR, we tested whether cSVD-protein associations observed in CSF showed some indication of association in plasma, and vice-versa, with a less stringent multiple testing correction than in the discovery analysis, considering significant associations in the original fluid only. Thirty-seven cSVD-associated CSF proteins had plasma pQTL available. Nine of these (24%) were associated with the same MRI-cSVD phenotype in plasma at p FDR < 0.05 (APOE, ARSB, EPO, AMD, CTSS, PSMP with WMH; PILRA-M14, PILRA-deltaTM, KTEL1 with WM-PVS; Methods, Figs. 3A and 6, Supplementary Table 10). Six cSVD-associated plasma proteins had CSF cis-pQTL available. Four of these (67%) were associated with the same MRI-cSVD phenotype in CSF at p FDR < 0.05 (AMD, EPO with WMH; PILRA-M14, PILRA-deltaTM with WM-PVS; Figs. 3B and 6, Supplementary Table 11). Directions of association were mostly concordant except for EPO, APOE and PSMP, which showed opposite directions of association in CSF and plasma, in line with previous observations 12 and highlighting the importance of studying multiple tissues to capture the complexity of underlying biology.
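The follow-up thresholds above are expressed as FDR-corrected p-values. As a minimal sketch of how such a correction can be computed, the block below implements the Benjamini-Hochberg step-up procedure; the choice of Benjamini-Hochberg is an assumption (the text only states "p FDR"), and the p-values and the function name `benjamini_hochberg` are hypothetical illustrations, not study results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: the FDR procedure is Benjamini-Hochberg; the paper's
    exact procedure is not stated in this section.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values stay monotone (each is capped by the one ranked above it).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five protein-phenotype tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(pvals)
n_significant = sum(a < 0.05 for a in adj)
```

Restricting the correction to the associations that were significant in the original fluid, as described above, reduces the number of tests `m` and therefore makes the follow-up threshold less stringent than in discovery.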
Second, a cross-platform follow-up was performed by testing the association with MRI-cSVD of plasma protein levels measured on the Olink Explore-3072 platform in two independent population-based studies, 3C-Dijon (N = 1,056; mean age 72.5 years) and UK Biobank (N = 5,494; mean age 63.5 years, Supplementary Table 12). Twenty-nine of the 49 cSVD-associated proteins (59%) were available; 26 were used after quality control and their plasma level was tested against WMH volume and PVS burden using linear regression followed by inverse variance weighted meta-analysis (N = 6,550). Of these, 7 proteins (27%, all identified in CSF 2SMR) showed association with the same MRI-cSVD marker at p FDR < 0.05 (ARSB, PRSS8, CTSS, CTSB, TFPI and BT3A2 with WMH; IL-6 with HIP-PVS; Figs. 3 and 6, Supplementary Tables 13-14). Directionality of association with MRI-cSVD was inconsistent between CSF pQTL and plasma protein levels for PRSS8, TFPI, IL-6, and CTSS. Inter-platform correlations for these proteins between Somascan and Olink were moderate to good in plasma and CSF respectively (Supplementary Table 15) 25; however, correlations were not available between plasma and CSF.
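The pooling step described above (cohort-level linear regression followed by inverse variance weighted meta-analysis) can be sketched as follows. This is an illustration under stated assumptions: a fixed-effect model and a normal approximation for the p-value are assumed, and the effect sizes, standard errors and the function name `ivw_meta` are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse variance weighted meta-analysis.

    betas, ses: per-cohort regression coefficients and standard errors.
    Returns the pooled estimate, its standard error and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical estimates for one protein in two cohorts
# (a smaller and a larger one, so the second gets more weight).
cohort_betas = [0.10, 0.06]
cohort_ses = [0.05, 0.02]
pooled_beta, pooled_se, pooled_p = ivw_meta(cohort_betas, cohort_ses)
```

Because each cohort is weighted by the inverse of its variance, the larger cohort dominates the pooled estimate, and the pooled standard error is smaller than either cohort-specific one.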
Third, we conducted a cross-ancestry exploratory follow-up, testing associations of MRI-cSVD with plasma protein levels measured on the Somascan 4K platform in the Japanese population-based Nagahama study (N = 785; mean age 68 years). Thirty-eight of the 49 cSVD-associated proteins (77%) were available and their plasma level was tested against WMH volume and extensive PVS burden. Two proteins (both identified in CSF 2SMR in Europeans) were associated at p FDR < 0.05 with the same MRI-marker (WM-PVS), with consistent directionality (ERO1B and PCSK9); given the small sample size we also considered nominally significant associations, observed for four additional proteins, with WMH (BT2A1, CTSB, TNC, PSMP; Figs. 3 and 6, Supplementary Table 16).
Fourth, we took an exploratory lifespan approach by testing the relation of cSVD-associated proteins with MRI-cSVD in young adults (i-Share study, N = 1,748; mean age 22.1 years). Here we used 2SMR with the same cis-pQTL as for discovery analyses. Consistent with findings in older adults, higher genetically determined CSF levels of PILRA-M14 and PILRA-deltaTM were associated with larger WMH volume at p FDR < 0.05. In addition, higher genetically determined CSF protein levels of GPNMB:CD and GPNMB:ECD (cellular and extracellular domain of a transmembrane glycoprotein upregulated upon tissue damage and inflammation) and TLR1:ECD (extracellular domain of toll-like receptor 1, which plays a fundamental role in activation of innate immunity) were associated with BG-PVS and WMH volume respectively at p < 0.05, in a direction consistent with older adults (Fig. 3, Supplementary Table 17).
Overall, of 49 cSVD-associated proteins (Supplementary Table 18, Fig. 6), (i) 16 CSF proteins showed associations with the same MRI-cSVD marker in plasma in at least one analysis (pQTL or direct protein measurement) at p FDR < 0.05, with consistent directionality in 63%; (ii) 24 CSF proteins were not associated with the same MRI-marker in plasma (p ≥ 0.05) and may be considered as CSF-specific; (iii) 4 proteins were identified in plasma pQTL analyses only, with non-significant follow-up in association with direct plasma protein measurements; (iv) 5 proteins had no follow-up available apart from the lifespan exploration; and (v) 6 proteins had evidence for lifespan effects at p < 0.05 (2 at p FDR < 0.05).

Predictive performance of protein genetic risk scores (GRS)
We assessed the ability of selected cis-based protein-GRS to predict a composite extreme-cSVD phenotype (extensive WMH volume ± lacunes vs. minimal WMH volume without lacunes) in the 3C-Dijon cohort, benchmarking it against a previously validated WMH-GRS 20 (Methods). Using the WMH-GRS only, we achieved an AUC of 0.568 (95% bootstrap CI 0.501-0.634). Adding any of the four selected protein GRS slightly improved the AUC, while adding them all achieved a maximum improvement of +0.04 (AUC = 0.608; 95% CI 0.544-0.672; Extended Data Fig. 4, Supplementary Table 19).
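To make the AUC and bootstrap confidence interval reported above concrete, the following sketch computes a rank-based AUC and a percentile bootstrap interval. It is an illustration only: the percentile method, the number of resamples, the toy scores and the function names `auc` and `bootstrap_ci` are assumptions, not the procedure or data used in the study.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a case outranks a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in).

    Requires at least one case and one control in the input.
    """
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that lack either cases or controls.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    low = stats[int(alpha / 2 * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy genetic risk scores (hypothetical): 1 = extensive cSVD, 0 = minimal cSVD.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
ci_low, ci_high = bootstrap_ci(toy_scores, toy_labels, n_boot=200)
```

Comparing two models (for example WMH-GRS alone versus WMH-GRS plus protein GRS) amounts to computing this quantity for each model's predicted scores on the same participants.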

Clinical significance
We explored the relation of the 49 cSVD-associated proteins with stroke (any, ischemic, and small vessel stroke; intracerebral hemorrhage) and AD (Methods). We leveraged the aforementioned CSF and plasma pQTL, as well as European-ancestry summary statistics of GWAS for stroke and its subtypes (N ≤ 73,652 cases) and AD (N = 71,880 cases). Twenty-four proteins (49%) showed associations with at least one clinical outcome at p < 0.05 (Figs. 4 and 6). At p FDR < 0.05, eight CSF proteins (APOE, PILRA-M14, PILRA-deltaTM, FcRIIIa, BGAT, PLA2R, TIMD3 and TPSNR) and four plasma proteins (EphB4, HTRA1, PILRA-M14, PILRA-deltaTM) were significantly associated with AD, while one CSF protein (BGAT, measuring histo-blood group ABO system glycosyltransferase activity) and one plasma protein (FBLN3) were associated with any stroke and ischemic stroke (Supplementary Tables 20-21). Nineteen of 49 proteins were available for partial follow-up in plasma using 2SMR in East-Asian participants in relation with ischemic and small vessel stroke, leveraging plasma pQTL from Biobank Japan (N = 2,886) and an East-Asian stroke GWAS meta-analysis (N ≤ 17,493). Overall, despite substantially smaller sample size for exposure and outcome in East-Asians, correlation of effect sizes was moderate to high (Extended Data Fig. 5). Higher plasma levels of NovH (encoded by CCN3), an ECM-associated protein involved in cardiovascular development, were associated with increased risk of small vessel stroke at p FDR < 0.05 (Supplementary Table 22).
To explore enrichment of observed protein-cSVD associations in particular cell-types we first conducted single-cell enrichment analyses using STEAP, leveraging multiple publicly available single-cell sequencing resources (Methods, Supplementary Table 24). Genes encoding several cSVD-associated CSF proteins (BT2A1, BT3A2, BT3A3, CTSS, HIBCH) showed significant enrichment in microglia, and the gene encoding one plasma protein (EPO) in immune cells (Supplementary Table 25, Extended Data Fig. 6). Next, we used unique resources of single nucleus RNA sequencing (snRNAseq) derived from up to 443 post-mortem brain samples (dorsolateral prefrontal cortex) from the ROSMAP older population-based cohort 26-29. In silico sorting of human cortical tissue samples was used to derive vascular brain cells 27,28. From these snRNAseq resources we could derive cell-type specific brain eQTLs for 19 and 10 genes encoding cSVD-associated proteins, in non-vascular and vascular cells respectively (Methods). Using MR, we found lower genetically determined expression levels of TLR1 in oligodendrocytes (p FDR = 2.24×10⁻⁴) and CTSS in smooth muscle cells (p FDR = 2.3×10⁻³) to be associated with larger WMH volume, both consistent with directionality of associations in CSF (Supplementary Tables 26-27). Higher genetically determined expression of ABO (encoding BGAT) in pericytes was protective for extensive WM-PVS (p FDR = 2.3×10⁻³, opposite direction compared to CSF). All three associations showed evidence for colocalization (PP.H4 > 0.7). Genes encoding cSVD-associated proteins showed distinct cerebrovascular cell-specific gene expression patterns (e.g. with EFEMP1 expression dominating in a new subtype of perivascular fibroblasts) and we observed a non-significant trend towards an overall enrichment in pericytes (Extended Data Fig. 7). We also tested enrichment of our genes of interest in different microglial states (Methods, Extended Data Fig. 7), given the aforementioned results observed with STEAP, and observed significant enrichment in a microglial state previously found to be itself enriched in processes such as ribosome biogenesis, amyloid fibril formation, and positive regulation of T-cell mediated immunity 29.

Proteomics-driven drug discovery
We used MR estimates from the 49 CSF and plasma proteins with MRI-cSVD to support drug discovery.
Using public drug databases (Methods), we curated drugs (commercialized for other indications or under investigation in clinical trials) targeting these proteins in a direction compatible with beneficial therapeutic effects against cSVD based on MR estimates. We identified such drugs for EPO, LTF, TFPI, and EPHB4 for WMH, and for COL6A1, GPNMB, and PCSK9 for PVS; most of these proteins were associated with MRI-cSVD in the CSF only, except EPHB4 (plasma), EPO and TFPI (CSF and plasma; Fig. 5, Supplementary Table 28). Some of these proteins have predicted or experimentally proven interactions with each other (Fig. 2E-F), suggesting that identified drugs may impact related pathways. Of note, drugs targeting EPO and LTF as agonists and EPHB4 as inhibitors cross the blood-brain barrier (Supplementary Table 28).
Results of protein-cSVD associations along with clinical significance, pathway or cell-type enrichment and drug target identification are summarized in Supplementary Table 18 and Fig. 6.

Discussion
By combining a unique CSF and plasma pQTL resource with the latest GWAS of MRI-cSVD in a Mendelian randomization framework, we describe a new biological fingerprint of cSVD comprising 49 protein-cSVD associations with a putative causal relation, predominantly in the CSF. To assess robustness and specificity of our findings we implemented a multipronged follow-up approach, across fluids, proteomic platforms, and ancestries, which included testing associations of direct plasma protein measurements with MRI-cSVD. We highlight 16 proteins robustly associated in both CSF and plasma, of which 12 are in the same direction, while 24 and four proteins were identified in CSF or plasma only, with no evidence for association in the other fluid. Strikingly, several cSVD-associated proteins already showed associations with WMH and PVS burden at age 20 with consistent directionality. The fact that half of cSVD-associated proteins show at least nominally significant associations with stroke, AD, or both highlights their clinical relevance. Pathway and cell-type enrichment analyses suggest an important role of extracellular matrix and immune response pathways, with single-cell RNA-sequencing analyses pointing predominantly to microglia, but also oligodendrocytes, vascular smooth muscle cells and pericytes. Finally, besides revealing potential novel biomarkers and drug targets to be investigated, our findings also provide genetic support for repositioning of seven drugs for cSVD.
Previous explorations of cSVD proteomics were mainly conducted on focused protein panels 30,31, mostly in plasma 14-17,32 (except a recent study on 16 CSF proteins) 33 and in relatively small cohorts (usually N < 1,000) 34. Here we analyzed over 2,500 plasma and CSF proteins in relation with WMH and PVS burden in over 40,000 participants. In recent years, CSF biomarkers have emerged as pivotal for unraveling the intricate mechanisms underlying neurodegenerative and neuroinflammatory diseases, given their proximity to the central nervous system 35-37. Our findings suggest that this also holds true for cSVD. Indeed, CSF-based MR analyses revealed five times more protein-cSVD associations than plasma-based MR, despite ten times smaller sample size to derive pQTL. Among proteins with pQTL available in both plasma and CSF resources, 67% of cSVD-associated plasma proteins also showed associations with the same MRI-cSVD markers in CSF, whereas only 24% of cSVD-associated CSF proteins showed associations in plasma. Even when accounting for follow-up with direct protein measurements, only 43% of cSVD-associated CSF proteins were associated with MRI-cSVD in plasma, suggesting that some protein-cSVD associations are specific to CSF, as described for other neurological disorders 12,13.
Some proteins associated with MRI-cSVD were particularly robust, with consistent directionalities of their association across fluids and platforms, using both pQTL-based and direct measurements, especially PILRA-deltaTM, PILRA-M14, ARSB and CTSB.
PILRA (paired immunoglobulin-like type 2 receptor alpha) is a microglial immunoreceptor involved in β-amyloid uptake and herpes simplex virus 1 infection 38. Somascan measures soluble PILRA isoforms lacking the transmembrane domain 39 (PILRA-deltaTM and PILRA-M14) while Olink detects the full protein. Higher genetically determined CSF levels of PILRA-M14 and PILRA-deltaTM were associated with larger WMH volume across the lifespan, notably already in young adults in their twenties. In contrast, higher genetically determined CSF and plasma levels of PILRA isoforms were associated with smaller WM-PVS burden and lower risk of AD (p < 10⁻²³ for high CSF levels). Higher plasma levels of PILRA (Olink direct measurements) were also protective for WM-PVS. This could potentially indirectly point to a protective effect of PILRA on cSVD caused by cerebral amyloid angiopathy (CAA), as WM-PVS was recently proposed as a novel CAA biomarker 40, and CAA is associated with a strongly increased risk of AD 41. Interestingly, previous experimental work has supported PILRA as the likely causal gene at the chr7q21 AD risk locus 42, suggesting that a common missense variant in this gene (rs1859788, r² = 0.3 with PILRA pQTL) may protect against AD via reduced inhibitory signaling in microglia and reduced microglial infection during HSV-1 recurrence. The opposite effect we observed on WMH is intriguing, requiring further explorations, such as an examination of differential associations with WMH spatial patterns.
ARSB (arylsulfatase B) plays an important role in ECM degradation, regulation of neurite outgrowth and neuronal adaptability in the central nervous system 43, where it is expressed predominantly in the microglia 44,45. ARSB deficiency causes a lysosomal storage disorder (mucopolysaccharidosis) 46. Here higher ARSB levels in CSF and plasma were associated with greater WMH volume based on both Somascan pQTL and direct Olink protein measurements, making ARSB a compelling candidate to explore as a circulating cSVD biomarker. CTSB (cathepsin B) is a cerebrovascular matrisome-associated protein identified in brain microvessels 31. This lysosomal cysteine protease is involved in proteolysis of ECM components and enhanced vessel wall permeability 47, as well as in proteolysis of amyloid precursor protein, implicated in AD 48. Genetically determined higher CTSB levels in CSF were associated with smaller WMH and BG-PVS burden, replicating in plasma, across platforms (pQTL and direct measurements) and ancestries, and with lower AD risk at nominal significance. Similar associations were observed between higher genetically determined CSF and plasma levels of CTSS (cathepsin S, another cysteine protease) and smaller WMH volume, but higher direct plasma CTSS measurements were associated with larger WMH volume. A potential explanation for such discrepancies could be that pQTL and direct measurements capture different isoforms (Olink assays have been developed for the canonical CTSS isoform 1). Of note, rare mutations in CTSA (encoding cathepsin A, a serine protease like HTRA1) cause a rare monogenic autosomal dominant cSVD known as CARASAL 49. Our findings thus expand the involvement of cathepsins to complex cSVD, and to cysteine in addition to serine proteases.
We also show for the first time an association of lower plasma levels of HTRA1 (High-Temperature Requirement A serine peptidase 1), another cerebrovascular matrisome protein, with extensive HIP-PVS, consistent with loss-of-function mechanisms underlying monogenic cSVD caused by rare mutations in HTRA1 (CARASIL, autosomal dominant HTRA1 mutations) 50. Rare and common variants at HTRA1 have been associated with larger WMH volume and increased stroke risk in the general population 51,52, with recent findings suggesting loss-of-function mechanisms through both reduced HTRA1 expression and lower serine protease enzyme activity. The association of lower genetically determined plasma levels of HTRA1 with extensive HIP-PVS provides additional evidence for an impact of HTRA1 loss-of-function on brain health. Interestingly, lower genetically determined HTRA1 plasma protein levels were also associated with higher risk of stroke (any, ischemic) and AD at p < 0.05. Overall, our proteogenomic analyses lend support to a prominent role of the cerebrovascular matrisome (extracellular matrix and associated proteins) in both monogenic and multifactorial cSVD, corroborating and expanding findings from large genomic studies 5,6 and preclinical work on monogenic cSVD models 31.
In parallel, our findings also reveal prominent associations of immune response pathways with MRI-cSVD. Intriguingly, associations with proteins involved in immunity and inflammation (with PILRA, TLR1, GPNMB, all three expressed predominantly in microglia) were already detectable in young adults in their twenties. We also found expression of genes encoding CSF cSVD-associated proteins to be significantly enriched in microglial cells, the brain's primary resident immune cells. The interplay between cSVD and inflammation has gained recent interest, with emerging evidence from focused biomarker studies and experimental models, suggesting that activation of immune cells and in particular microglial cells could play an important role 53-57. Co-registration of MRI images with (immuno-)histopathological data has shown that WMH volume was associated with higher microglial activation, supporting that the latter could be involved in cSVD etiology 58. Our results lend further support to this, suggesting that this could be one of the earliest processes involved, as demonstrated for AD 59. Given growing evidence that changes in microglial transcriptional profiles play a crucial role in brain aging and AD and that blood proteins can mediate neurotoxic microglial functions 60, the proteogenomic signature we describe might contribute to revealing biological underpinnings of the intricate relation between cSVD and AD 29,61,62.
Some cSVD-associated proteins are encoded by genes in cSVD GWAS loci, strengthening evidence for their involvement in disease pathogenesis. At chr17q21, lower plasma levels of NMT1 (N-Myristoyltransferase 1), a protein involved in vascular instability and endothelial cell damage 63-65, were associated with larger WMH volume, aligning with prior associations of lower arterial NMT1 expression with larger WMH burden 66. At chr2p16, lower plasma levels of FBLN3 (Fibulin-3, encoded by EFEMP1), a glycoprotein essential for maintaining ECM and vessel integrity and involved in cell proliferation and migration, were associated with larger WMH volume 23,67. Furthermore, beyond genetic risk scores derived from cSVD GWAS, genetic risk scores for cSVD-associated proteins may have added predictive value for identifying those with extensive cSVD burden, highlighting the potential of multiomics approaches for enhancing risk prediction and stratification.
This work further unveiled new prospects for therapeutic repositioning and development, with the identification of seven drugs (targeting EPO, LTF, TFPI, EphB4, COL6A1, GPNMB, and PCSK9) with cSVD MR results compatible with potential beneficial therapeutic effects, warranting further investigation. Of these, agonists for EPO and LTF and inhibitors of EphB4, which are either approved or studied in phase II clinical trials for other indications (Supplementary Table 28), present evidence of successfully crossing the blood-brain barrier (BBB), although it is unclear whether this is required to treat cSVD. EPO is a neuroprotective protein safeguarding the BBB against VEGF-induced permeability 68, acting through the Keap1/Nrf2 pathway in ischemia reperfusion injury 69. LTF has anti-inflammatory and neuroprotective properties and can upregulate EPO 69 and downregulate IL-6 70,71, both associated with MRI-cSVD in our study. EPO and LTF were reported to show strong protein-protein interaction with collaborative anti-inflammatory properties 69, and modified, optimized versions of both these proteins have been tested experimentally as neuroprotective agents in ischemic stroke and intracerebral hemorrhage, and, for some, patented (WO2006120030A1) 72-74. Erythropoietin-producing hepatocellular receptor B4 (EphB4), a tyrosine kinase receptor expressed in vascular endothelial cells, plays a crucial role in vascular development and adult vascular biology, influencing blood vessel permeability, inflammation, and angiogenesis through interaction with the Notch pathway 75. Drugs inhibiting PCSK9, COL6A1, or GPNMB and enhancing TFPI may hold promise for cSVD as well (Supplementary Table 28). PCSK9 is a convertase strongly linked to lipid homeostasis but also involved in neuronal apoptosis, neurogenesis, and brain inflammation 76. Elevated PCSK9 levels have been associated with ischemic stroke (plasma) and AD (CSF) 76. A protective effect of PCSK9 inhibitors on ischemic stroke has been demonstrated 77. More recently, PCSK9 was shown to regulate amyloid beta clearance from the brain and peripheral PCSK9 inhibition reduced Aβ pathology in prefrontal cortex and hippocampus in mice 78. Here, the robust association of high PCSK9 levels with larger WM-PVS burden, both in Europeans (CSF, Somascan pQTL) and East-Asians (plasma Somascan direct measurements), could suggest an association with the CAA subtype of cSVD 40, characterized by Aβ deposition in the brain vasculature. The bi-directional MR result suggesting not only a putative causal association of higher PCSK9 levels with WM-PVS, but also an association of larger genetically determined WM-PVS burden with higher CSF PCSK9 levels is intriguing. Extensive WM-PVS burden is believed to reflect underlying glymphatic dysfunction, involved in impaired clearance of amyloid beta, but also other substances from the brain 79.
Strengths of our study include the large-scale proteogenomics approach in plasma and CSF, using a Mendelian randomization framework that provides evidence for potential causality. The multipronged follow-up strategy across fluids and platforms strongly enhances the robustness of our findings. Although limited by smaller sample size, the extension across the lifespan and to East-Asian ancestry groups is unique and provides crucial insights on early life mechanisms underlying cSVD, while enabling transportability of findings to East-Asian populations where cSVD is particularly prevalent 80. We acknowledge limitations. pQTL were derived from a population enriched in neurologically impaired individuals (especially AD patients); however, we previously showed that pQTL are only marginally influenced by disease status 12; moreover, follow-up samples were not enriched in AD patients. Although we have used the largest available commercial panel, discovery was limited to proteins quantified by Somascan, for which valid pQTL instruments could be derived, representing less than 10% of known proteins (without accounting for isoforms). We had no available sample for following up associations in the CSF, given the scarcity of CSF proteomics resources, and the fact that lumbar puncture is typically not done in the context of cSVD. Non-significant follow-up of associations discovered using Somascan pQTL with Olink direct plasma protein measurements may reflect spurious findings but also lack of power or modest correlation across platforms due to distinct technology. Inconsistent directionality of some significant associations between pQTL analyses and direct measurements or between both platforms requires further exploration but could reflect that distinct isoforms are being captured. Overall, these complexities highlight the importance of multiple follow-up and validation steps when interpreting association results from high-throughput proteomics assays.

Conclusion
Our work provides an extensive, rst in vivo biological ngerprint of cSVD derived from large-scale proteogenomics studies in CSF and blood.The results highlight important biological processes underlying cSVD at the molecular and cellular levels, pointing to shared pathways between cSVD and AD of potential therapeutic relevance and early life mechanisms involving immunity and in ammation.This proteogenomic signature paves the way for deriving circulating biomarkers and exploring drug development and repositioning opportunities.The Genome Research @ Ace Alzheimer Center Barcelona project (GR@ACE) is supported by Grifols SA, Fundación bancaria 'La Caixa', Ace Alzheimer Center Barcelona and CIBERNED.Ace Alzheimer Center Barcelona is one of the participating centers of the Dementia Genetics Spanish Consortium (DEGESCO).The FACEHBI study is supported by funds from Ace Alzheimer Center Barcelona, Grifols, Life Molecular Imaging, Araclon Biotech, Alkahest, Laboratorio de análisis Echevarne and IrsiCaixa.Authors acknowledge the support of the Spanish Ministry of Science and Innovation, Proyectos de Generación de Conocimiento grants PID2021-122473OA-I00, PID2021-123462OB-I00 and PID2019-106625RB-I00. ISCIII, Acción Estratégica en Salud, integrated in the Spanish National R+D+I Plan and nanced by ISCIII Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER "Una manera
Additional support was provided through R01AG023629 from the National Institute on Aging (NIA). A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org. This research has been conducted using the UK Biobank Resource under applications no. 94113 and no. 18545.

used as instrumental variables (IVs). We extracted the association estimates between the variants and the exposures or the outcomes and aligned the effect alleles using the TwoSampleMR R package.
For proteins with multiple IVs we computed MR estimates with random-effect Inverse Variance Weighted (IVW) analysis 83, complemented by checks that rely on distinct assumptions for validity: (i) Heterogeneity across the MR estimates of individual instruments was assessed using Cochran's Q statistic (p<0.05 was considered significant) 83; (ii) Horizontal pleiotropy was assessed using the MR-Egger intercept as a measure of directional pleiotropy (p<0.05 was considered significant) 84. We further conducted various sensitivity analyses 85:
1. The identification of outlier IVs and their removal from analyses was conducted using MR Pleiotropy Residual Sum and Outlier (MR-PRESSO) 86 (p<0.05 was considered significant).
2. Reverse MR was run by reversing the direction of inference, using the MRI-cSVD markers as the exposure and proteins as the outcome, to assess potential reverse causation.
3. MR-Egger regression 87 and Weighted median, which are more robust to the use of pleiotropic instruments, were used as sensitivity analyses.
When pleiotropy was observed, we retained results when at least 2 of the 3 sensitivity methods (MR-Egger, Weighted median, MR-PRESSO) were concordant with each other and p<0.05.
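The multi-instrument estimate described above can be illustrated with a minimal sketch. It assumes the common multiplicative random-effects formulation of IVW (the fixed-effect standard error is inflated by the square root of Q/df, and never deflated) and first-order standard errors for the per-instrument ratios; the function name `ivw_random_effects` and its inputs are hypothetical and are not taken from the TwoSampleMR package or from the study's own scripts.

```python
import math


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Sketch of an IVW MR estimate with multiplicative random effects.

    beta_exp: variant-protein effects (exposure), one per instrument
    beta_out: variant-outcome effects, aligned to the same effect allele
    se_out:   standard errors of the variant-outcome effects
    """
    if not (len(beta_exp) == len(beta_out) == len(se_out)) or len(beta_exp) < 2:
        raise ValueError("need at least two instruments with matching inputs")

    # Per-instrument ratio estimates with first-order standard errors
    # (assumption: uncertainty in the exposure effect is ignored).
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ratio_se = [so / abs(be) for be, so in zip(beta_exp, se_out)]

    # Inverse-variance weights and pooled estimate.
    weights = [1.0 / s ** 2 for s in ratio_se]
    w_sum = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / w_sum

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1

    # Multiplicative random effects: scale the variance by Q/df when Q/df > 1.
    phi = max(1.0, q / df)
    se = math.sqrt(phi / w_sum)
    return {"estimate": estimate, "se": se, "q": q, "df": df}
```

The p-value for Q would be obtained from a chi-square distribution with `df` degrees of freedom; it is left out here to keep the sketch free of third-party dependencies.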
For proteins with a single IV we computed MR estimates using the Wald ratio. MR analyses were followed by colocalization analyses using coloc 88, including variants ±1Mb surrounding the pQTL of interest. Associations were considered significant when the posterior probability H4 (PPH4; shared association with single causal variant) was >0.70, and suggestive for PPH4<0.70 when the posterior probability H3 (PPH3; shared association with different causal variant) was <0.70 89. Associations with PPH4<0.70 and PPH3>0.70 were removed from further analyses.
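As an illustration of the single-instrument case and of the colocalization decision rule, a minimal sketch follows. The function names are hypothetical, the Wald ratio standard error uses the first-order approximation (an assumption, since the text does not state which approximation was used), and the handling of posterior probabilities exactly equal to 0.70 is also an assumption because the text only specifies strict inequalities.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio MR estimate for one instrument.

    Returns (estimate, se), with the first-order standard error
    se_outcome / |beta_exposure| (assumed approximation).
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def classify_coloc(pp_h3, pp_h4, threshold=0.70):
    """Apply the PPH4/PPH3 rule described in the text.

    "significant": PPH4 above the threshold
    "removed":     PPH4 not above the threshold and PPH3 above it
    "suggestive":  neither above the threshold (ties are treated as
                   suggestive, which is an assumption of this sketch)
    """
    if pp_h4 > threshold:
        return "significant"
    if pp_h3 > threshold:
        return "removed"
    return "suggestive"
```

For example, `classify_coloc(pp_h3=0.30, pp_h4=0.50)` falls in the suggestive category, whereas `classify_coloc(pp_h3=0.80, pp_h4=0.10)` would be dropped from further analyses.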
Discovery MR results were considered significant when passing the Benjamini-Hochberg FDR-corrected significance threshold (p_FDR<0.05). In sensitivity analyses we additionally corrected for the number of independent phenotypes tested, estimated using correlations between traits in the 3C study by applying the Matrix Spectral Decomposition (matSpDlite 90) method to WMH volume and each PVS location (p_FDR<1.2×10⁻²; 0.05/4).
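For readers less familiar with the Benjamini-Hochberg correction, a minimal sketch of the standard step-up adjustment is given here. The function name is hypothetical and the example p-values used in the usage note are invented for illustration; they are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Stricter threshold used in the sensitivity analyses (4 independent phenotypes).
SENSITIVITY_THRESHOLD = 0.05 / 4
```

An association would then be retained in the main analysis when its adjusted value is below 0.05, and in the sensitivity analysis when it is below `SENSITIVITY_THRESHOLD`.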

Genetic correlation of identified protein-cSVD
Genetic correlations were computed using LDSC to identify proteins that may have a shared genetic basis, leveraging pQTL summary statistics of the 45 proteins identified in CSF and 9 proteins identified in plasma. Only proteins with heritability greater than 20% could be used (N_CSF=24, N_plasma=9). A threshold of p<5×10⁻⁵ was used, correcting for the mean number of proteins tested and 3 situations: CSF-CSF, CSF-plasma and plasma-plasma (0.05/(18×18×3)).

Follow-up of significant protein-cSVD associations
We conducted multivariable linear and logistic regression of individual proteins with WMH volume and PVS burden, adjusted for the delay between age at blood draw and age at the time of MRI, sex, batch effect, and total intracranial volume (or mask volume for WMH in 3C-Dijon). WMH volume and PVS burden in basal ganglia and white matter were inverse normal transformed, and PVS in hippocampus values were dichotomized, comparing participants in the top quartile of the PVS burden distribution to the rest, as previously described 5. An inverse variance weighted meta-analysis was performed using the metafor R package 98 to combine 3C-Dijon and UKB association analyses. The heterogeneity of associations across studies was assessed using the Cochran-Mantel-Haenszel statistical test; only associations without significant heterogeneity (p>1.9×10⁻³; 0.05/26, correcting for the 26 proteins available for follow-up) were considered. Significant associations were defined by p_FDR<0.05. In addition, results of sensitivity analyses at p_FDR<1.2×10⁻² are displayed, accounting for the 4 phenotypes tested.
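The inverse normal transformation applied to WMH volume and PVS burden can be sketched as follows. This is a rank-based version using the Blom offset (3/8), which is an assumption: the text does not state which offset or tie-handling rule was used, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transformation.

    Ties receive the average of their ranks; the offset of 0.375
    (Blom) is an assumed default, not taken from the study.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [
        standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
        for r in ranks
    ]
```

The transformed values preserve the ordering of the original measurements while following an approximately standard normal distribution, which is why the regression coefficients are interpreted per standard deviation of the transformed marker.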
Correlation analyses between protein levels were conducted in UKB (the larger of the two samples) using the corrplot 99 R package. Correlations were defined as significant at the Bonferroni-corrected p-value threshold of p<7.7×10⁻⁵ (0.05/((26×26)−26)).
Cross-ancestry follow-up (direct protein measurements, Somascan, plasma)
Brain imaging and plasma proteomic data from the Nagahama study, a prospective population-based cohort study initiated in 2007 in Nagahama, Japan (N=10,082 at baseline), were used 100. Healthy participants (without serious physical impairment or health issues) aged 30 to 74 years were recruited between 2008 and 2010 from the general population of Nagahama (Japan) and followed up 5 years after baseline, between 2013 and 2015. Plasma proteomic measurements have been conducted on a subset of 2,000 individuals using Somascan 4.0. Of those, 858 had brain MRI measurements. WMH in Nagahama was quantified using UBO Detector 101, a publicly available automated tool which extracts features from T1w and FLAIR input images, such as relative intensity levels, tissue probability, and anatomical location, to classify FLAIR hyperintensities as WMH using a k-Nearest Neighbor algorithm. A trained rater reviewed the visual quality control report generated by the tool to reject gross failures in tissue probability estimates and WMH classification. PVS burden was estimated using the aforementioned machine-learning based SHIVA-PVS algorithm 5,96. QC checks and transformation of proteomic measurements (log2) were conducted according to standardized Somascan protocols. After excluding participants for whom the estimation of the MRI-marker was not possible, without proteomics measurements passing QC, with prevalent stroke, with missing covariates, or who had withdrawn their consent, a total of 785 participants were available for association analyses. We conducted linear regression for WMH, WM-PVS and BG-PVS as inverse normal transformed continuous variables, adjusted for age, sex, batch, total intracranial volume and the first 4 principal components. Significant associations at p_FDR<0.05 were reported. Given the exploratory nature of these cross-ancestry analyses on a much smaller sample size, associations at p<0.05 were also reported.
Follow-up across the lifespan (pQTL, Somascan, plasma and CSF)

Association analysis with extreme-cSVD
We investigated the ability of protein-GRS to predict extremes of cSVD severity (extreme-cSVD) in the 3C-Dijon cohort 92. Briefly, after removing individuals with prevalent stroke, dementia, or brain tumor, we defined a binary phenotype for extreme-cSVD in 1,497 participants with MRI and genome-wide genotype data (N=58 extensive, with WMH burden in the top quartile of the cohort distribution ± presence of lacunes; 253 minimal-cSVD, with WMH burden in the bottom quartile of the cohort distribution and no lacunes or other types of brain infarcts, Supplementary methods).
We performed logistic regression of each of the protein-GRS with extreme-cSVD as the dependent variable, adjusting for the first 5 principal components for population stratification 110. We also used a previously derived WMH GRS (weighted sum of independent genome-wide significant risk variants for WMH volume), a strong genetic predictor of WMH volume, for comparison 20. The number of SNPs in each GRS is included in Supplementary Table 19. We found five genetically determined CSF and plasma proteins nominally associated with extreme-cSVD, although none remained significant after Bonferroni correction for the 49 protein-GRS (p<0.001).

Prediction Performance
We assessed the performance of these 5 protein-GRS to predict extreme-cSVD, individually and combined, in models adjusted for the first 5 principal components and WMH-GRS: CSF.Cystatin-M, PLASMA.PILRA-M14, CSF.PPAC, PLASMA.PILRA-deltaTM, and CSF.TLR1-ECD. As PILRA isoforms were extremely correlated (r=0.99), we selected the isoform displaying the strongest association with extreme-cSVD (PILRA-M14) for the combined model. Prediction performance was evaluated in 3C-Dijon through internally validated AUC using the optimism bootstrap estimator in the caret R package (2,000 bootstrap replications) 111.
Clinical significance
To explore the relation of genetically determined protein levels with clinical complications of cSVD, we used summary statistics of the largest available GWAS (European ancestry subset) for stroke and dementia. Summary statistics for any stroke, ischemic stroke, and small vessel stroke were derived from the GIGASTROKE study (comprising 73,652 patients with any stroke, 62,100 with ischemic stroke, and 6,811 with small vessel stroke 112), and summary statistics for intracerebral hemorrhage from the largest publicly available GWAS (ICH, 1,545 patients 113). For dementia we used summary statistics of the largest GWAS for Alzheimer's disease, comprising 71,880 AD cases 114.
Following the steps of instrument selection and MR described above, we performed two-sample MR to test the relation of genetically determined levels (in plasma and CSF) of each of the 49 cSVD-associated proteins with stroke (subtypes) and dementia. To capture trends towards clinical significance we considered associations at p<0.05 and reported significant findings after multiple testing correction at p_FDR<0.05.

Cross-ancestry
To assess the causal association between serum protein levels of cSVD-associated proteins and stroke in individuals of East-Asian ancestry, we conducted two-sample MR analyses in BioBank Japan (BBJ, first cohort study 115), which recruited around 200,000 participants with at least one of 47 target diseases across 66 hospitals in Japan between 2003 and 2007. Proteomic profiling was conducted for a total of 2,886 individuals of East-Asian ancestry from two previous studies 116, 117 with whole genome sequencing datasets, using the Olink Explore 3072 panel following the manufacturer's protocol. Data pre-processing, including intensity normalization, bridge normalization across batches, and QC, was conducted according to standardized Olink protocols. Rank-based inverse normal transformation was applied to protein level measurements before association tests. pQTL summary statistics of serum protein levels were obtained for 19 available proteins (out of the 49 cSVD-associated proteins from the discovery analysis) by meta-analyzing summary statistics generated in individuals from each study separately using REGENIE v3.2.9 107 (adjusted for age, sex, age², age×sex, age²×sex, batch, and the first 10 genotype principal components) and METAL 118 (inverse variance weighted method; fixed effect model). Summary statistics of GWAS for ischemic stroke (N=17,493), large-artery atherosclerotic stroke (N=1,322), cardioembolic stroke (N=747), and small vessel stroke (N=4,876) were obtained in the BBJ first cohort using REGENIE v3.2.9 (adjusted for age, sex, and the first 10 genotype principal components), excluding the samples used for proteomic profiling. Genotyping, quality control, and imputation for BBJ samples used in the stroke GWASs were conducted as previously described 119, except that the imputation was performed using a reference panel combining the 1000 Genomes Project phase 3 v5 reference panel and 3,256 Japanese samples (JEWEL3k) 120. Individuals without any type of stroke or cerebral aneurysm were used as controls. Instrument selection and MR were conducted following the methods previously described (p-threshold for clumping: 1×10⁻⁶, methods).

Biological interpretation
Protein-protein interactions
Protein-protein interactions were analyzed using the STRING database with the initial set of 1,121 proteins for CSF and 2,805 for plasma as background.

Proteomics driven drug discovery
Using significant MR results from CSF and plasma, we restricted our analysis to drug-targeting proteins using 4 drug-gene databases (ChEMBL, pharmGKB, DrugBank and TTD). Following this methodology, eight drug-targeting proteins were identified for WMH (EPO, LTF, TFPI, APOE, ARSB, CTSS, CTSB and EPHB4) and seven for PVS (COL6A1, CTSB, GPNMB, PCSK9, FcRIIIA, Heparin co-factor II, IL6). Using public drug databases, we then curated drugs targeting those proteins in a direction compatible with a beneficial therapeutic effect against the corresponding cSVD phenotype based on MR estimates. The desired mode of action (MoA) was defined as the opposite direction of the MR estimate. Once the drugs were identified, we searched the literature for a potential action of the drug.
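The direction-matching rule for the desired mode of action can be made explicit with a short sketch. The labels "decrease" and "increase", the function names, and the numeric values in the usage note are hypothetical and serve only to illustrate the rule; they do not reproduce the curation performed in the drug databases.

```python
def desired_mode_of_action(mr_beta):
    """Desired drug effect on the protein, opposite to the MR estimate.

    A positive MR estimate (higher genetically determined protein level,
    higher cSVD burden) calls for a drug that decreases the protein's
    level or activity, and vice versa.
    """
    if mr_beta > 0:
        return "decrease"
    if mr_beta < 0:
        return "increase"
    return "undetermined"


def is_compatible(drug_effect_on_target, mr_beta):
    """True when a drug's known effect matches the desired direction.

    drug_effect_on_target: "decrease" (e.g. inhibitor, antagonist) or
    "increase" (e.g. agonist, replacement therapy); these labels are
    assumptions of this sketch.
    """
    return drug_effect_on_target == desired_mode_of_action(mr_beta)
```

For example, with an invented MR estimate of 0.12 for a protein, `is_compatible("decrease", 0.12)` is true and `is_compatible("increase", 0.12)` is false.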

Declarations
This project is supported by a grant overseen by the French National Research Agency (ANR) as part of the "Investment for the Future Programme" ANR-18-RHUS-0002 and by the Precision and Global Vascular Brain Health Institute (VBHI) funded by the France 2030 IHU3 initiative. The project also received funding from the French National Research Agency (ANR) through the SHIVA project. Computations were performed on the Bordeaux Bioinformatics Center (CBiB) and the CREDIM computer resources, University of Bordeaux. Funding support for additional computer resources has been provided to S.D. by the Fondation Claude Pompidou. The i-Share study has received funding from the French National Agency (Agence Nationale de la Recherche, ANR), via the Investment for the Future program (grant nos. ANR-10-COHO-05 and ANR-18-RHUS-0002), and from the University of Bordeaux Initiative of Excellence (IdEX). The Three City (3C) Study is conducted under a partnership agreement among the Institut National de la Santé et de la Recherche Médicale (INSERM), the University of Bordeaux, and Sanofi-Aventis. The Fondation pour la Recherche Médicale funded the preparation and initiation of the study. The 3C Study is also supported by the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, Mutuelle Générale de l'Education Nationale (MGEN), Institut de la Longévité, Conseils Régionaux of Aquitaine and Bourgogne, Fondation de France, and Ministry of Research-INSERM Programme "Cohortes et collections de données biologiques."

Figure 3 Summary

Figure 4 Clinical